
-------------------------------------------------------------------------------------------------
IMS R&D Focus data
-------------------------------------------------------------------------------------------------

This is proprietary data available for purchase from IMS Health. As of February 2013, the contact
at IMS is Barbara Doyle, BDoyle@us.imshealth.com.

Licensed users may download data via a web interface. While IMS does not delete historical records,
it may update them, particularly following a merger (at which point the pre-merger firm names are
replaced with the name of the merged firm). Data downloaded today may therefore differ slightly from
the data used in the paper, although such changes should not affect the variables used in the analysis.
Variable definitions for the original dataset are contained in the document "Fields used in R&D Focus.pdf".

From the web interface:

1) Create 5 templates that specify a subset of variables as follows:
   Template_1: Preferred name, product name, company, nationality, latest phase, class code, class description, CAS number, action
   Template_2: Preferred name, product name, company, lead company, franchise company, franchise company nationality, franchise corporation, franchise corporation nationality, franchise relationship
   Template_3: Preferred name, product name, company, patentee, patent data, latest phase, development summary, science summary, mode of administration
   Template_4: Preferred name, product name, company, country status, country indication, country phase, launch country
   Template_5: Preferred name, product name, company, active program, estimated launch, indication, history, latest news, update date

2) Under the Search Criteria tab, choose "Class code" "is like" "X" (where X = A, B, C, ... for each ATC1 code), and then click "Run Search".

3) Switch to the Report Build tab, and choose "Apply Template" "Template_N" (where N = 1 to 5).

4) Choose "Export," and then "Results List" "Format: CSV"
	Only 250 records can be exported at a time, so you will have to do this in blocks of 250 observations (note: this limit
	may have changed in subsequent releases).
	Rename the exported file ClassX_Set_N_File_i.csv where X corresponds to the ATC1, N corresponds to the template number, and i corresponds to the block of 250 records.

5) In addition to the csv export for template 5, also export "Records" "Format: HTML"
	Rename the exported file ClassX_History_i.html where X corresponds to the ATC1 and i corresponds to the block of 250 records.
	This is a necessary step for extracting the dates associated with each historical event for a drug project.

6) Create a folder for each ATC1 that contains all the csv and html files.

7) Concatenate the html files and execute convert_html and convert_spaces.

8) Run make_rd.sas to import data and ultimately create a dataset with disease-year observations.
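
Steps 4 through 7 produce many similarly named files, so a small script can help check the naming and do the
concatenation. The following is a minimal Python sketch and is not part of the original replication package.
The function names, the assumption that block numbers i start at 1, the one-folder-per-class layout, and the
output name ClassX_History_all.html are all hypothetical choices made for illustration. It only joins the HTML
exports in block order; convert_html and convert_spaces must still be run afterwards.

```python
import math
import re
from pathlib import Path

BLOCK_SIZE = 250  # export limit noted in step 4; may differ in later releases


def expected_csv_names(atc1, n_records, n_templates=5, block_size=BLOCK_SIZE):
    """List the CSV names step 4 would produce for one ATC1 class.

    Assumes block numbers start at 1 (the README does not say).
    """
    n_blocks = math.ceil(n_records / block_size)
    return [
        f"Class{atc1}_Set_{n}_File_{i}.csv"
        for n in range(1, n_templates + 1)
        for i in range(1, n_blocks + 1)
    ]


def concatenate_history(folder, atc1, out_name=None):
    """Join ClassX_History_i.html files in numeric block order (step 7)."""
    folder = Path(folder)
    pattern = re.compile(rf"^Class{re.escape(atc1)}_History_(\d+)\.html$")
    blocks = []
    for path in folder.iterdir():
        match = pattern.match(path.name)
        if match:
            blocks.append((int(match.group(1)), path))
    blocks.sort()  # numeric sort, so block 10 follows block 9 rather than block 1
    # Hypothetical output name; it does not match the pattern above,
    # so re-running the function will not pick up its own output.
    out_path = folder / (out_name or f"Class{atc1}_History_all.html")
    with open(out_path, "w", encoding="utf-8") as out:
        for _, path in blocks:
            # The encoding of the exports is an assumption; undecodable bytes are replaced.
            out.write(path.read_text(encoding="utf-8", errors="replace"))
            out.write("\n")
    return out_path
```

For example, expected_csv_names("A", 600) lists 15 names (5 templates times 3 blocks), which can be compared
against the contents of the class A folder before running make_rd.sas.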

-------------------------------------------------------------------------------------------------
Mortality data
-------------------------------------------------------------------------------------------------

This is public data available from the World Health Organization. The current web address is:

http://www.who.int/whosis/mort/download/en/

Please refer to the WHO website for a detailed description of this data.

The data used in the paper is contained in two ASCII files provided, morticd9.csv and morticd10.csv,
which were downloaded in April 2009. These WILL NOT match the versions now available from the WHO, since
the WHO revises these datasets periodically and specifically advises researchers to use the current version.
From the WHO website:

"WHO asks users to cooperate in the provision of electronically transmitted data by adhering to the following guidelines:

   a. Material drawn from the MDB for publication must be accompanied by an acknowledgement of WHO as the source and a disclaimer 
crediting analyses, interpretations or conclusions to the author of the published data and not to WHO, which is responsible only for 
the provision of the original information.
   b. Users wishing to publish a technical description or qualification of the data will make a reasonable effort to ensure that 
it is not inconsistent with any published by WHO.
   c. Recipients of electronically transmitted data wishing to, or asked to make these, or copies thereof available to a third party 
are asked to refer such party to WHO, who will transmit the data directly accompanied with the necessary documentation. This will 
prevent circulation of out-of-date data, as the MDB is updated regularly."

To comply with point (c), I am not posting the outdated files used for the paper. If replication with the current data is not possible, 
I will make the earlier versions available for comparison.

Run make_mortality.sas to create a dataset with country-disease-year observations. Because of the large
number of missing values, the program uses multiple imputation (PROC MI in SAS).

-------------------------------------------------------------------------------------------------
Country-level data
-------------------------------------------------------------------------------------------------

Data on World Bank income classifications can be downloaded from the World Bank:

http://siteresources.worldbank.org/DATASTATISTICS/Resources/OGHIST.xls

I have reformatted this for easier importing as WB_classification.csv.

Data on country-level IPR policies is drawn from:
     a. The World Trade Organization, which provides information on the membership dates, required
        TRIPS compliance, and self-declared income status for each member state. We collected this
        by hand.
     b. The Ginarte-Park Index, which was provided to us by Walter Park.
     c. The Hamdan Index, which was provided to us by Intan Hamdan-Livramento.

This information, along with some reformatting of country names for merging, is contained in
Country_IPR_data.csv.

-------------------------------------------------------------------------------------------------
Constructing the final dataset and running regressions
-------------------------------------------------------------------------------------------------

The program make_trips_data.sas merges all the input datasets and creates the disease-year data
used for regression analysis.

The program run_regressions.sas is self-explanatory and generates Tables 1-5. 
Note that because of the multiple imputation used for mortality data, it is necessary to use
PROC MIANALYZE after PROC GENMOD to combine the estimates across the imputed datasets and obtain
correct standard errors.
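
For readers unfamiliar with how estimates from multiply imputed datasets are combined, the sketch below applies
Rubin's rules to a single coefficient in Python. It is illustrative only and is not the code used in the paper:
the function name pool_rubin is hypothetical, and PROC MIANALYZE also reports degrees of freedom and other
diagnostics that are omitted here.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates  : the coefficient estimated on each imputed dataset
    std_errors : the matching standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors from at least 2 imputations")
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(se ** 2 for se in std_errors)  # average within-imputation variance
    between = variance(estimates)                # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between       # total variance
    return q_bar, math.sqrt(total)
```

The pooled standard error is never smaller than the root of the average within-imputation variance, because the
between-imputation term adds the uncertainty due to the missing values. This is why standard errors taken from
any single imputed dataset would be too small.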

